Abstract

Model-based  diagnosis  can  be  formulated  as  the 
combinatorial  optimization  problem  of  finding  an 
assignment  of  behavior  modes  to  all  the  compo- 
nents in  a system  such  that  it  is  not  only  consis- 
tent with  the  system  description  and  observations, 
but  also  maximizes  the  prior  probability  associated 
with  it.  Because  the  general  case  of  this  problem 
is  exponential  in  the  number  of  components,  we 
try  to  leverage  the  structure  of  the  physical  sys- 
tem under  consideration.  Traditional  dynamic  pro- 
gramming techniques  based  on  the  underlying  con- 
straint network  (like  heuristics  derived  from  maxi- 
mum cardinality  ordering)  do  not  necessarily  sup- 
plement or  do  better  than  algorithms  based  on  using 
truth  maintenance  systems  (like  conflict-directed 
best  first  search). 

In  this  paper,  we  compare  the  two  approaches  and 
examine  how  we  can  incorporate  the  dynamic  pro- 
gramming paradigm  into  TMS-based  algorithms  to 
achieve  the  best  of  both  the  worlds.  We  describe 
an  algorithm  called  hierarchical  conflict-directed 
best  first  search  (HCBFS)  to  solve  a large  diag- 
nosis problem  by  heuristically  decomposing  it  into 
smaller sub-problems. We also delve into some of
the implications of HCBFS with respect to (1) precompiling
the system description to a form that can
amortize the cost of a diagnosis call and (2) facilitating
other hybrid techniques for diagnosis.

1 Introduction 

Diagnosis  is  an  important  component  of  autonomy  for  any 
intelligent  agent.  Often,  an  intelligent  agent  plans  a set  of  ac- 
tions to  achieve  certain  goals.  Because  some  conditions  may 
be  unforeseen,  it  is  important  for  it  to  be  able  to  reconfigure 
its  plan  depending  upon  the  state  in  which  it  is.  This  mode 
identification  problem  is  essentially  a problem  of  diagnosis. 
In its simplest form, the problem of diagnosis is to find a
suitable assignment of behavior modes to the components of
a system, given some observations made on
it. It is possible to handle the case of a dynamic system by
treating the transition variables as components (in one sense)
[Kurien and Nayak, 2000].


Definition (Diagnosis System): A diagnosis system is a triple
$(SD, COMPS, OBS)$ such that:

1. $SD$ is a system description expressed in one of several
forms: constraint languages like propositional logic,
probabilistic models like Bayesian networks, etc. $SD$ specifies
both component behavior information ($SD_b$) and component
structure information ($SD_t$, i.e. the topology of the system).

2. $COMPS$ is a finite set of components of the system. A
component $comp_i$ ($1 \le i \le |COMPS|$) can behave in one
of a finite set of modes ($M_i$). If these modes are
not specified explicitly, then we assume two modes: failed
($AB(comp_i)$) and normal ($\neg AB(comp_i)$).

3. $OBS$ is a set of observations expressed as variable values.

The task of diagnosis is to “identify” the modes in which
individual components are behaving given the system description
($SD$) and the observations ($OBS$).

Definition (Candidate): Given a set of integers
$i_1 \cdots i_{|COMPS|}$ (such that for $1 \le j \le |COMPS|$,
$1 \le i_j \le |M_j|$), a candidate $Cand(i_1 \cdots i_{|COMPS|})$ is
defined as
$Cand(i_1 \cdots i_{|COMPS|}) = \left( \bigwedge_{k=1}^{|COMPS|} (comp_k = M_k(i_k)) \right)$.

Here, $M_u(v)$ denotes the $v$th element in the set $M_u$ (assumed
to be indexed in some way).

2 Diagnosis  as  Combinatorial  Optimization 

Consider diagnosing a system consisting of three bulbs
$B_1$, $B_2$ and $B_3$ connected in parallel to the same voltage
source $V$ under the observations $off(B_1)$, $off(B_2)$
and $on(B_3)$. $AB(V) \wedge AB(B_3)$ is a diagnosis under the
consistency-based formalization of diagnosis [de Kleer et al.,
1992] if we had constraints only of the form
$\neg AB(B_i) \wedge \neg AB(V) \rightarrow B_i = on$. Intuitively however, it does not
seem reasonable because $B_3$ cannot be on without $V$ working
properly. One way to get around this is to include fault
models in the system [Struss and Dressler, 1989]. These are
constraints that explicitly describe the behavior of a component
when it is not in its nominal mode (the most expected mode
of behavior of a component). Such a constraint in this example
would be $AB(B_3) \rightarrow off(B_3)$. Diagnosis can become
indiscriminate without fault models. It is also easy to see
that the consistency-based approach can exploit fault models
(when they are specified) to produce more intuitive diagnoses
(like only $B_1$ and $B_2$ being abnormal).

The  technique  of  using  fault  models  however  is  associated 
with  the  problem  of  being  too  restrictive.  It  may  not  model 
the  case  of  some  strange  source  of  power  making  B3  on  etc. 
The  way  out  of  this  is  to  allow  for  many  modes  of  behavior 
for  the  components  of  the  system.  Each  component  has  a set 
of  modes  with  associated  models  — normal  modes  and  fault 
modes.  Each  component  has  the  unknown  fault  mode  with 
the  empty  model.  The  unknown  mode  tries  to  capture  the 
modeling  incompleteness  assumption  (obscure  modes  that  we 
cannot  model  in  the  system).  Also,  each  mode  has  an  associ- 
ated probability  that  is  the  prior  probability  of  the  component 
behaving in that mode. Diagnosis can now be cast as a combinatorial
optimization problem of assigning modes of behavior
to each component such that the assignment is not only consistent with
$SD \cup OBS$, but also maximizes the product of the prior probabilities
associated with those modes [de Kleer and Williams,
1989]. Note that the combinatorial optimization formulation
of  diagnosis  assumes  independence  of  the  behavior  modes  of 
components. 

Definition (Combinatorial Optimization Characterization): A
candidate $H = Cand(i_1 \cdots i_{|COMPS|})$ is a diagnosis if
and only if $SD \cup H \cup OBS$ is satisfiable and
$P(H) = \prod_{k=1}^{|COMPS|} P(comp_k = M_k(i_k))$ is maximized.
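The characterization above can be illustrated on the three-bulb example with a brute-force enumeration of candidates. The following is only a minimal sketch: the prior values in `PRIORS`, the mode names `ok`/`ab`, and the complete fault model in `predicted` (a bulb is on exactly when both it and the source are normal) are illustrative assumptions, not values or models given in this paper.

```python
from itertools import product

# Hypothetical priors per component and mode (illustrative values only).
PRIORS = {
    "V":  {"ok": 0.99, "ab": 0.01},
    "B1": {"ok": 0.95, "ab": 0.05},
    "B2": {"ok": 0.95, "ab": 0.05},
    "B3": {"ok": 0.95, "ab": 0.05},
}
OBS = {"B1": "off", "B2": "off", "B3": "on"}


def predicted(cand, bulb):
    """Assumed SD: a bulb is on iff the bulb and the source are both normal."""
    return "on" if cand[bulb] == "ok" and cand["V"] == "ok" else "off"


def consistent(cand, obs):
    """Satisfiability of SD, the candidate and OBS for this toy model."""
    return all(predicted(cand, bulb) == state for bulb, state in obs.items())


def prior(cand, priors):
    """P(H) as a product of independent per-component mode priors."""
    p = 1.0
    for comp, mode in cand.items():
        p *= priors[comp][mode]
    return p


def diagnose(priors, obs):
    """Enumerate every candidate and keep the most probable consistent one."""
    comps = sorted(priors)
    best, best_p = None, 0.0
    for modes in product(*(priors[c] for c in comps)):
        cand = dict(zip(comps, modes))
        if consistent(cand, obs):
            p = prior(cand, priors)
            if p > best_p:
                best, best_p = cand, p
    return best, best_p


if __name__ == "__main__":
    print(diagnose(PRIORS, OBS))
```

The enumeration visits every one of the $2^4$ candidates, which is exactly the exponential behavior the remainder of the paper tries to avoid.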

There  are  many  other  characterizations  of  diagnoses  based 
on  the  notions  of  abduction,  Bayesian  model  selection,  model 
counting  [Kumar,  2002]  etc.  These  characterizations  (includ- 
ing combinatorial  optimization)  are  mostly  for  choosing  the 
most  likely  diagnosis  and  do  not  incorporate  any  notion  of 
refinement  [Lucas,  1997].  The  combinatorial  optimization 
formulation  to  return  the  most  likely  diagnosis  is  however 
justified,  practical  and  suited  for  a variety  of  real-life  appli- 
cations [Kurien  and  Nayak,  2000].  It  also  benefits  from  the 
availability  of  computationally  efficient  algorithms  to  solve 
combinatorial  optimization  problems  [Williams  and  Nayak, 
1996]. 

3 Computational  Methods 

Definition (Combinatorial Optimization Problem): A combinatorial
optimization problem is a tuple $(V, f, c)$ where (1)
$V$ is a set of discrete variables with finite domains. (2) An
assignment maps each $v$ in $V$ to a value in $v$'s domain. (3) $f$
is a function that decides feasibility of assignments. (4) $c$ is a
function that returns the cost of an assignment. (5) We want
to minimize $c(V)$ such that $f(V)$ holds.

In the context of diagnosis, the following correspondences
hold: (1) $V = COMPS$. (2) Domains correspond to modes
of behavior of components. (3) An assignment is a candidate.
(4) $c$ is a simple cost model assuming independence
in behavior modes of components:
$c(comp_i = M_i(v)) = \log \frac{P(comp_i = M_i(v^*))}{P(comp_i = M_i(v))}$.
Here, $M_i(v^*)$ is the nominal mode of
behavior of $comp_i$, with $P(comp_i = M_i(v^*)) > P(comp_i = M_i(v))$
for any $v \ne v^*$, and
$c(Cand(i_1 \cdots i_{|COMPS|})) = \sum_{k=1}^{|COMPS|} c(comp_k = M_k(i_k))$.
(5) $f$ is the satisfiability
of $SD \cup Cand(i_1 \cdots i_{|COMPS|}) \cup OBS$.
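A short worked step shows why minimizing this cost agrees with maximizing the prior probability of a candidate. It assumes the per-component cost is the log-ratio of the nominal-mode prior to the assigned-mode prior, and writes $v_k^*$ for the nominal mode index of $comp_k$ (a notation introduced here only for the derivation).

```latex
\begin{align*}
c(Cand(i_1 \cdots i_{|COMPS|}))
  &= \sum_{k=1}^{|COMPS|} \log \frac{P(comp_k = M_k(v_k^*))}{P(comp_k = M_k(i_k))} \\
  &= \underbrace{\sum_{k=1}^{|COMPS|} \log P(comp_k = M_k(v_k^*))}_{\text{independent of the candidate}}
     \;-\; \log \prod_{k=1}^{|COMPS|} P(comp_k = M_k(i_k)) \\
  &= \text{const} - \log P(H).
\end{align*}
```

Under that assumption, every per-component cost is non-negative, the all-nominal candidate has cost 0, and a feasible candidate of minimum cost is a feasible candidate of maximum $P(H)$.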

A brute-force method of solving such a problem is to use
a simple best first search (BFS), which is clearly exponential
in the number of components. It can, however, be potentially
improved by leveraging the structure of the system. One
popular  method  of  leveraging  structure  using  the  paradigm 
of  dynamic  programming  is  to  use  heuristics  derived  from  a 
maximum  cardinality  ordering  (m-ordering)  [Tarjan  and  Yan- 
nakakis,  1984]  over  the  constraint  network  relating  the  vari- 
ables of  the  system.  Such  techniques  have  been  used  in  a va- 
riety of  domains  — Bayesian  network  reasoning,  constraint 
satisfaction  problems  [Dechter,  1992]  etc.  A constraint  net- 
work on  the  variables  of  the  system  is  defined  by  having 
the  variables  represent  nodes  and  constraints  in  SD  repre- 
sent hyper-edges.  Any  kind  of  optimization  or  satisfaction 
defined  over  the  variables  can  be  done  in  time  exponential  in 
the induced width of the graph [Dechter, 1992]. Although the
induced  width  itself  cannot  be  found  constructively  in  poly- 
nomial time,  heuristics  derived  from  m-ordering  perform  rea- 
sonably well  in  practice.  Throughout  the  rest  of  this  paper, 
we  will  refer  to  all  such  heuristics  as  naive  m-ordering  (naive 
because  they  do  not  supplement  the  power  of  TMS-based  al- 
gorithms). 
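As a concrete reference point, maximum cardinality ordering and the induced width along the resulting ordering can be sketched as follows. This is a minimal illustration on an ordinary (binary-edge) graph given as an adjacency dictionary; the function names and the alphabetical tie-breaking rule are assumptions made for the example, and hyper-edges are assumed to have been expanded into cliques beforehand.

```python
def max_cardinality_order(adj):
    """Visit vertices so that each next vertex has the most already-visited
    neighbours (ties broken alphabetically for reproducibility)."""
    order, numbered = [], set()
    weight = {v: 0 for v in adj}
    while len(order) < len(adj):
        candidates = [u for u in sorted(adj) if u not in numbered]
        v = max(candidates, key=lambda u: weight[u])
        order.append(v)
        numbered.add(v)
        for u in adj[v]:
            if u not in numbered:
                weight[u] += 1
    return order


def induced_width(adj, order):
    """Process vertices from last to first, connecting the earlier neighbours
    of each vertex; the width is the largest such neighbour set."""
    pos = {v: i for i, v in enumerate(order)}
    g = {v: set(ns) for v, ns in adj.items()}
    width = 0
    for v in reversed(order):
        parents = {u for u in g[v] if pos[u] < pos[v]}
        width = max(width, len(parents))
        for a in parents:
            g[a] |= parents - {a}
    return width


if __name__ == "__main__":
    cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
    order = max_cardinality_order(cycle)
    print(order, induced_width(cycle, order))
```

Dynamic programming along such an ordering stores a table per vertex over its earlier neighbours, which is where the exponential dependence on the induced width comes from.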

These heuristics, however, may not be directly beneficial
or applicable when the number of components is somewhat
smaller than the total number of variables in the system (which
is usually the case). The induced width of the constraint network
relating all the variables in a physical system can easily
be much more than the number of components. A further disadvantage
of such approaches is that often the relationships
between variables are too complex and consistency checks
may involve some kind of “simulation”. Since dynamic
programming techniques based on these heuristics maintain
and build partial assignments, they are very likely to be costly
processes. Furthermore, in many cases, the number of faulty
components is far smaller than the total number of components,
and these techniques do not exploit this significantly
towards computational gains.

One  approach  that  addresses  these  problems  some- 
what indirectly  is  conflict-directed  best  first  search  (CBFS) 
[Williams  and  Nayak,  1996].  It  is  based  on  the  idea  of  ex- 
amining hypotheses  in  decreasing  order  of  their  prior  prob- 
abilities and  using  a truth  maintenance  system  (TMS)  to 
catch  minimal  conflicts  and  focus  the  search.  QCBFS  [Ku- 
mar, 2001]  is  an  extension  of  CBFS  that  leverages  qualitative 
knowledge  present  in  the  system.  Because  hypotheses  are  ex- 
amined in  order  of  their  probabilities,  diagnoses  that  entail  a 
nominal  behavior  for  all  but  a few  components  are  caught  as 
soon  as  possible  (unlike  in  the  naive  m-ordering  case). 

A TMS  incorporates  and  uses  the  following  properties:  (1) 
If  a partial  assignment  to  the  mode  behaviors  of  a subset  of 
the  components  is  inconsistent,  then  any  other  assignment 
that  contains  this  subset  unchanged  is  also  inconsistent.  (2) 
Smaller  conflicts  result  in  more  pruning  of  the  search  space 
and  therefore,  whenever  an  assignment  A is  infeasible,  a min- 
imal infeasible  subset  of  A is  returned  (using  dependency 
tracking).  (3)  Since  the  hypotheses  that  we  examine  differ 
only  incrementally  from  one  another  in  the  assignments  for 
behavior  modes  of  components,  feasibility  checks  are  made 
more efficient (as in the ITMS [Nayak and Williams, 1997]).
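The interplay between probability-ordered search and conflicts can be sketched in a few lines. The code below is a deliberately simplified variant and not the CBFS algorithm of [Williams and Nayak, 1996]: recorded conflicts are used only to skip consistency checks (property (1) above), not to direct successor generation. The priors, the mode names, and the hand-written `check` function standing in for a TMS are all hypothetical.

```python
import heapq
from math import log

# Hypothetical priors and observations for the three-bulb example.
PRIORS = {
    "V":  {"ok": 0.99, "ab": 0.01},
    "B1": {"ok": 0.95, "ab": 0.05},
    "B2": {"ok": 0.95, "ab": 0.05},
    "B3": {"ok": 0.95, "ab": 0.05},
}
OBS = {"B1": "off", "B2": "off", "B3": "on"}


def check(cand):
    """Stand-in for a TMS: return None if consistent, else a small conflict."""
    for bulb, state in OBS.items():
        if state == "on":
            for comp in (bulb, "V"):
                if cand[comp] == "ab":
                    return {comp: "ab"}
        elif cand[bulb] == "ok" and cand["V"] == "ok":
            return {bulb: "ok", "V": "ok"}
    return None


def cbfs(priors, check):
    """Examine candidates in decreasing prior order, skipping any candidate
    that contains a previously recorded conflict."""
    comps = sorted(priors)
    # Modes of each component, most probable (nominal) first.
    modes = {c: sorted(priors[c], key=lambda m: -priors[c][m]) for c in comps}

    def cost(idx):
        return sum(-log(priors[c][modes[c][i]]) for c, i in zip(comps, idx))

    start = (0,) * len(comps)
    heap, seen, conflicts, tests = [(cost(start), start)], {start}, [], 0
    while heap:
        _, idx = heapq.heappop(heap)
        cand = {c: modes[c][i] for c, i in zip(comps, idx)}
        subsumed = any(all(cand[c] == m for c, m in k.items()) for k in conflicts)
        if not subsumed:
            tests += 1
            conflict = check(cand)
            if conflict is None:
                return cand, tests
            conflicts.append(conflict)
        # Successors: move exactly one component to its next most likely mode.
        for j, c in enumerate(comps):
            if idx[j] + 1 < len(modes[c]):
                nxt = idx[:j] + (idx[j] + 1,) + idx[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (cost(nxt), nxt))
    return None, tests


if __name__ == "__main__":
    print(cbfs(PRIORS, check))
```

Because successor costs never decrease, candidates leave the heap in order of decreasing prior probability, so the first consistent candidate is a most likely diagnosis; the conflicts let the search reach it after testing only a few of the 16 candidates.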


Figure 1: (a) The worst-case scenario for m-ordering.
(b) The worst-case scenario for CBFS.

3.1  Comparison  of  naive  m-ordering  and  CBFS 

While  naive  m-ordering  exploits  the  structure  of  the  under- 
lying constraint  network,  it  does  not  exploit  the  fact  that  we 
are  interested  in  an  assignment  only  to  the  components  of  the 
system  (and  not  the  intermediate  variables).  This  becomes 
a liability  especially  when  consistency  checks  involve  “sim- 
ulation” and  are  therefore  costly.  It  performs  badly  when  a 
“small” number of components are “tightly” connected. Figure 1(a)
illustrates the bad behavior of m-ordering. There
are 4 components that can possibly behave in different modes
(C1, C2, C3 and C4). F1, F2 and F3 are not modeled as components
but are some complex mappings (involving simulation)
from their inputs to outputs. The number of parents of
C4  is  equal  to  6 and  the  combinatorial  optimization  problem 
is  exponential  in  this  quantity  [Darwiche,  1998].  A TMS- 
based  algorithm  however,  would  require  only  a search  space 
exponential  in  the  number  of  components  (=4).  This  can  be 
verified  by  noting  that  once  a set  of  modes  is  assumed  for  each 
component  (as  in  a TMS-based  algorithm),  verifying  that  the 
current  set  of  inputs  lead  to  the  observations  is  not  exponen- 
tial but  only  polynomial  in  the  size  of  the  graph.  This  is  be- 
cause any  component  maps  its  inputs  to  a unique  output  and 
we  just  need  to  follow  the  inputs  through  all  the  transforma- 
tions defined  by  the  components  to  eventually  verify  whether 
there is a match with the observations. In the case of naive
m-ordering however, combinatorial optimization requires us to
compute  and  store  against  all  values  of  communication  vari- 
ables around  a family  (also  called  partition),  the  most  likely 
modes  of  behavior  of  the  components  in  it.  This  makes  it  ex- 
ponential in  the  induced  width  of  the  graph.  It  is  also  easy  to 
see  (as  claimed  earlier)  that  when  the  diagnosis  is  quite  close 
to  the  nominal  behaviors  of  components,  there  is  no  obvious 
way  of  exploiting  it  with  m-ordering. 

CBFS, on the other hand, exploits the fact that we are interested
in an assignment only for the components of the system,
but does not exploit the structure of the physical setting efficiently.
The only indirect way in which the structure comes
into play is in the TMS implementation of $f$ to catch minimal
conflicts. The problem with CBFS is in large part due to
the fact that all inconsistencies are traced back to the components.
This makes CBFS perform sub-optimally when components
are “loosely” connected. Figure 1(b) illustrates the
bad behavior of CBFS. An observation of $O = 1$ when C7 is
an XOR gate entails the conflicts $\{T1 = 1, T2 = 1\}$ and
$\{T1 = 0, T2 = 0\}$. Note that $T1 = 0$, $T2 = 0$, $T1 = 1$ or
$T2 = 1$ are not conflicts by themselves. If all inconsistencies
are traced back to the components C1 - C6 however, the
search space over component behavior modes is never pruned
by a minimal conflict of size less than 6. If, on the other
hand, we split the problem into two (by treating the cases
$\{T1 = 1, T2 = 0\}$ and $\{T1 = 0, T2 = 1\}$ separately), the
search space can be reduced to being exponential in 4 variables
(rather than 6).

Figure 2: (A) The physical setting. (B) The graph representation.
(C) The constraint network. (D) The T-Graph.

4 Hierarchical  Conflict-Directed  Best  First 
Search  (HCBFS) 

Before  we  describe  HCBFS  as  an  algorithm  that  can  combine 
the  best  of  both  the  above  approaches,  we  define  the  follow- 
ing notions  related  to  the  structure  of  a physical  setting. 
Definition (Structural Parameter Set): The structural parameter
set $S$ of a physical system is the 4-tuple
$S = (COMPS, I, O, T)$. Here, $I$ is the set of external inputs,
$O$ is the set of output variables under observation, and $T$ is
the set of intermediate variables in the system which are not
under observation.

Definition (Graph Representation): The graph representation
of a physical system with structural parameter set $S$ and
a topology characterized by $SD_t$ is a graph with nodes corresponding
to elements in $S$ and undirected edges corresponding
to physical connections inferred from $SD_t$.

Definition (*-node): A node in the graph representation of a
physical system is a c-node, i-node, o-node or a t-node when
it corresponds respectively to a component, input variable,
output variable or an intermediate variable.

Definition (T-Graph): The T-Graph of a physical system with
structural parameter set $S$ and topology $SD_t$ is a graph built
by removing the c-nodes from its graph representation and
directly connecting the inputs to their outputs (in that direction).

Figure  2 illustrates  the  above  definitions  for  a simple  physi- 
cal setting.  Note  that  the  graph  representation  is  not  the  same 
as  the  constraint  network  specified  by  SD.  While  the  con- 
straint network  is  built  on  the  variables  of  the  system  (ex- 
cluding components)  using  SD,  the  graph  representation  is 
built  only  out  of  SDt  (and  includes  the  components).  The 
T-Graph  represents  the  causal  relationships  among  the  vari- 
ables (excluding  the  components)  and  it  can  be  observed  that 
the  constraint  network  is  equivalent  to  the  T-Graph  moralized 
by making a clique out of all the parents of any node [Dechter,
1992].

Notation: Let $M(i)$ be the set of modes in which component
$comp_i$ can behave. Let $c_i$ be the cardinality of this set. Let
$T(i)$ be the set of values an intermediate variable t-node-$i$ can
take. Let $t_i$ be the cardinality of this set.

Definition (c-size): The c-size of a sub-graph $G$ is the product
of the number of modes in which each component it contains
can behave, $= \prod_{i \in COMPS(G)} c_i$.

Definition (t-partition): A t-partition of a graph representation
is any collection of vertex induced sub-graphs $S_1 \cdots S_k$
such that for all $i \ne j$ with $1 \le i, j \le k$, $(S_i \cap S_j) \subseteq T$.

Definition (t-size): The t-size of a sub-graph in a t-partition
of the original graph is the product of the number of different
values each of the t-nodes it shares with other sub-graphs
can take. In other words, suppose $S_1 \cdots S_k$ form a
t-partition of the original graph. Denote the t-nodes in each
of these sub-graphs by $ST_1 \cdots ST_k$. The t-size of $S_i$ is given
by $\prod_{j \in ST_i} (t_j \mid \exists h, 1 \le h \le k, h \ne i, j \in ST_h)$.

Definition (ct-size): The ct-size of a graph is the product of
its c-size and t-size.
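The three size measures are simple products and can be computed directly. The sketch below assumes a hypothetical encoding in which each sub-graph of a t-partition is a set of node names, `comp_modes` maps each component to its number of modes ($c_i$), and `t_values` maps each t-node to its number of values ($t_i$).

```python
from math import prod


def c_size(part, comp_modes):
    """Product of the mode counts of the components contained in `part`."""
    return prod(n for comp, n in comp_modes.items() if comp in part)


def t_size(i, parts, t_values):
    """Product of the value counts of the t-nodes that parts[i] shares
    with at least one other sub-graph of the t-partition."""
    shared = {
        t for t in parts[i]
        if t in t_values and any(t in p for h, p in enumerate(parts) if h != i)
    }
    return prod(t_values[t] for t in shared)


def ct_size(i, parts, comp_modes, t_values):
    """ct-size = c-size times t-size."""
    return c_size(parts[i], comp_modes) * t_size(i, parts, t_values)


if __name__ == "__main__":
    parts = [{"C1", "C2", "C3", "T1"}, {"C4", "C5", "C6", "T1", "T2"}]
    comp_modes = {f"C{i}": 2 for i in range(1, 7)}
    t_values = {"T1": 3, "T2": 2}
    print([ct_size(i, parts, comp_modes, t_values) for i in range(len(parts))])
```

Note that `T2` in the example does not contribute to any t-size because it belongs to only one sub-graph.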

Given the graph representation of a physical system,
its c-size characterizes the size of the search space for
CBFS. The general idea behind HCBFS is to reduce the
effective search space of CBFS using dynamic programming.
Suppose we were able to divide the system into two
subsystems that had components $comp_{i_1} \cdots comp_{i_{n_1}}$ and
$comp_{j_1} \cdots comp_{j_{n_2}}$ such that $n_1 + n_2 = |COMPS|$. Now,
the search space for each of these two individual partitions
(for CBFS) becomes their respective c-sizes. Calling them
$C_1$ and $C_2$ respectively, we have $C_1 \cdot C_2 = C$ ($C$ is the c-size
of the original graph). Of course, the search cannot simply
be done in each of them independently because of the common
variables they share. However, we can apply the idea
of dynamic programming to solve each of these partitions
for all values of the variables they share and then “join” the
two results. If we allow for the common variables to be only
among the t-nodes, then the size of the search space becomes
$C_1 T + C_2 T + T^2$ ($T$ is the t-size of the common t-nodes).
$C_1 T + C_2 T$ accounts for solving the sub-problems for all
values of the communication variables, and $T^2$ accounts for
“joining” them. It should be noted however, that if consistency
checks involve “simulation”, then the $T^2$ term tends to
be negligible (because search over the join-space does not involve
simulation). Generalizing the above idea of dynamic
programming, it is also possible to characterize n-way splits
which partition the original graph into n partitions each of
which share communication variables with a subset of the
others.
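A small numeric instance makes the trade-off concrete. The numbers are hypothetical: six components with two modes each, split into two groups of three that communicate through a single binary t-node.

```latex
\begin{align*}
C &= 2^6 = 64, \qquad C_1 = C_2 = 2^3 = 8, \qquad T = 2, \\
C_1 T + C_2 T + T^2 &= 8 \cdot 2 + 8 \cdot 2 + 2^2 = 36 \;<\; 64 = C.
\end{align*}
```

If instead the two groups communicated through three binary t-nodes ($T = 2^3 = 8$), the same expression would give $64 + 64 + 64 = 192 > 64$, so the decomposition would not pay off for that split.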

Definition (Splitting Condition): The splitting condition
holds for a t-partition in a graph $G$ if the sum of the ct-sizes of
the partitions and the join-size is strictly less than the c-size
of $G$.

To obtain maximum computational benefits, we have to
find a t-partition that minimizes the sum of the ct-sizes of
the resulting partitions and the join-size. This general n-way
split is NP-hard to find (easy to prove from the fact that finding
the induced width is NP-hard). However, HCBFS employs
a heuristic to decompose a large diagnosis problem
into optimal sub-problems based on the topological structure
of the system. It runs in polynomial time and is always
assured of yielding computational benefits (albeit in
sub-optimal amounts). The idea is to examine only a polynomial
number of 2-way splits and choose the greediest one
if it satisfies the splitting condition. Such a splitting process
is performed recursively until there is no more apparent scope
for computational benefits. Interestingly enough, the candidate
t-partitions that are examined are themselves derived using
the m-ordering heuristics. Figure 3 presents the working
of HCBFS, and Figure 4 illustrates its working on a small
example.

ALGORITHM HCBFS (Graph G = (V, E))
    T = T-Graph of G
    T' = Partition-Tree formed by m-ordering on moralized T
    E = Edges of T'
    GREEDYSPLIT (G, E)
END HCBFS

ALGORITHM GREEDYSPLIT (Graph G, Candidate-Splits B)
    bk = BEST-SPLIT (G, B)
    IF (SPLITTING-CONDITION (G, bk)) THEN
        (G1, G2) = PARTITION (G, bk)
        B1 = {bi | bi is on the same side of bk as G1}
        B2 = {bi | bi is on the same side of bk as G2}
        GREEDYSPLIT (G1, B1)
        GREEDYSPLIT (G2, B2)
    END IF
END GREEDYSPLIT

Figure 3: Hierarchical Conflict-Directed Best First Search

Figure 4: Illustrates the working of HCBFS to produce sub-problems.
Thicker edges denote greater communication (t-size).
P1, P2, P3 are the final partitions. The tables indicate
the solutions to diagnosis sub-problems for all values of the
surrounding communication variables.

The following properties hold true for the HCBFS
algorithm.

Property 1: The edges of $T'$ maintain the running intersection
property [Dechter, 1992] and hence the t-nodes constituting
the communication variables on any edge form a valid
t-partition.

Property 2: Let the c-sizes of the final partitions be
$C_1 \cdots C_k$. The c-size of the original graph is therefore
$\prod_{i=1}^{k} C_i$. The first time we partition $G$, it must have been
the case that (because of the splitting condition having to be satisfied)
$\prod_{i=1}^{k} C_i > S \times T + T \times R$ ($T$ is the size of the communication;
$S$ and $R$ are the c-sizes of the two resulting partitions
with $S \times R = \prod_{i=1}^{k} C_i$). In later iterations, the effective $S$ and
$R$ are only made to decrease recursively and this essentially
means that HCBFS is always safe in producing computational
benefits.

Property 3: The total number of splits considered is clearly
linear since they correspond to the edges of $T'$. Although
there are two recursive calls to GREEDYSPLIT, the candidate
sets of edges that enter them are disjoint and hence
GREEDYSPLIT is called only a linear number of times. This
proves that the running time of HCBFS is polynomial.

Property 4: Choosing certain edges in a tree as splits results
in  a set  of  partitions  that  themselves  form  a tree  with  respect 
to  the  split  edges  (as  illustrated  in  Figure  4).  Since  we  know 
that  optimization  in  a tree  structured  network  is  exponential 
in  the  ct-size  of  the  largest  partition,  the  complexity  of  diag- 
nosis using  HCBFS  is  exponential  in  this  parameter. 
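The recursion of GREEDYSPLIT can be sketched on a partition tree as follows. This is a simplified illustration under stated assumptions rather than the paper's implementation: the tree is assumed to be given directly (hypothetical `csize` per tree node and `tsize` per tree edge), a split on an edge with t-size $T$ is scored as $C_1 T + C_2 T + T^2$ as in the two-way case discussed earlier, and communication through edges cut at earlier levels of the recursion is ignored when scoring.

```python
from math import prod


def component(nodes, edges, start):
    """Nodes of `nodes` reachable from `start` using only `edges`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in nodes and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen


def greedy_split(nodes, edges, csize, tsize):
    """Return the final partitions as a list of frozensets of tree nodes."""
    if not edges:
        return [frozenset(nodes)]

    def score(e):
        g1 = component(nodes, edges - {e}, e[0])
        c1 = prod(csize[n] for n in g1)
        c2 = prod(csize[n] for n in nodes - g1)
        t = tsize[e]
        return c1 * t + c2 * t + t * t, g1

    # BEST-SPLIT: the candidate edge with the smallest score.
    best = min(sorted(edges), key=lambda e: score(e)[0])
    best_score, g1 = score(best)
    # SPLITTING-CONDITION: the split must beat the c-size of the whole graph.
    if best_score >= prod(csize[n] for n in nodes):
        return [frozenset(nodes)]
    g2 = nodes - g1
    e1 = {e for e in edges if e != best and e[0] in g1}
    e2 = {e for e in edges if e != best and e[0] in g2}
    return greedy_split(g1, e1, csize, tsize) + greedy_split(g2, e2, csize, tsize)


if __name__ == "__main__":
    nodes = {"P1", "P2", "P3"}
    edges = {("P1", "P2"), ("P2", "P3")}
    csize = {"P1": 8, "P2": 8, "P3": 4}
    tsize = {("P1", "P2"): 2, ("P2", "P3"): 16}
    print(greedy_split(nodes, edges, csize, tsize))
```

In the usage example the thin edge between `P1` and `P2` is cut, while the thick edge between `P2` and `P3` fails the splitting condition and keeps those two nodes together. Each edge is either chosen as a split or passed to exactly one recursive call, which mirrors the disjointness argument of Property 3.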

4.1  Analysis  and  Implications  of  HCBFS 

We  briefly  delve  into  the  computational  implications  of 
HCBFS.  HCBFS  facilitates  search  in  two  ways.  First,  it  re- 
duces the  effective  search  space  by  using  the  dynamic  pro- 
gramming paradigm.  Second,  it  propagates  “easiness”  in  con- 
straint checking.  Constraint  checking  in  general  may  not  be 
computationally  straightforward  - it  may  often  involve  sys- 
tem “simulation”  of  some  kind  over  an  extended  period  of 
time.  It  can  be  noticed  however,  that  constraint  checking  over 
the  join  space  is  a mere  verification  that  two  selected  rows  of 
the  partition  tables  have  similar  values  for  their  communica- 
tion variables.  By  using  HCBFS,  the  simulation-based  con- 
straint checks  are  “pushed”  to  smaller  parts  of  the  system  (the 
partitions).  Even  for  consistency  checks  that  do  not  involve 
“simulation”,  implementing  a TMS  for  each  small  partition 
is  more  effective  (in  terms  of  the  complexity  of  data  struc- 
tures to  be  maintained)  than  one  large  TMS  for  the  system  as 
a whole. 
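The claim that constraint checking over the join space is a mere row-matching step can be illustrated with two partition tables. In this sketch each table is assumed to map a tuple of communication-variable values to the cheapest `(cost, modes)` solution of the corresponding sub-problem; the table layout and the name `join` are hypothetical.

```python
def join(table_a, table_b):
    """Pick the pair of rows that agree on the shared communication values
    and have the smallest total cost. No simulation is needed here: the
    only check is equality of the keys."""
    best = None
    for key, (cost_a, modes_a) in table_a.items():
        if key in table_b:
            cost_b, modes_b = table_b[key]
            total = cost_a + cost_b
            if best is None or total < best[0]:
                best = (total, {**modes_a, **modes_b})
    return best


if __name__ == "__main__":
    # One shared binary t-node; costs are illustrative integers.
    table_a = {(0,): (3, {"C1": "ok"}), (1,): (1, {"C1": "ab"})}
    table_b = {(0,): (1, {"C2": "ok"}), (1,): (5, {"C2": "ab"})}
    print(join(table_a, table_b))
```

The loop runs over at most $T$ rows per table, so its cost depends only on the t-size of the shared variables and not on the c-sizes of the partitions.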

HCBFS not only reduces the effective un-amortized search
complexity for a diagnosis call, but also reduces the amortized
complexity. The solutions to sub-problems occurring
for diagnosis calls made in the past can be stored and used for
future diagnosis calls when they need to solve the same
sub-problems. Eventually, when all sub-problems for all values
of communication variables have been solved at least once, a
diagnosis call can be answered by doing a search only over
the join-space of the partitions. This too (as argued before) is
computationally easier than “simulation”.
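Amortization of this kind amounts to memoizing sub-problem solutions on the values of the communication variables. A minimal sketch, assuming a hypothetical `solve(partition, t_values)` callback that performs the expensive search (for example with CBFS) for one partition:

```python
class SubproblemCache:
    """Stores sub-problem solutions keyed by partition name and the values
    of its communication variables, so each is solved at most once."""

    def __init__(self, solve):
        self._solve = solve      # expensive solver, supplied by the caller
        self._table = {}
        self.solver_calls = 0    # number of times the solver actually ran

    def lookup(self, partition, t_values):
        key = (partition, tuple(sorted(t_values.items())))
        if key not in self._table:
            self.solver_calls += 1
            self._table[key] = self._solve(partition, t_values)
        return self._table[key]


if __name__ == "__main__":
    def toy_solver(partition, t_values):
        # Placeholder for a real sub-problem search.
        return (sum(t_values.values()), {partition: "ok"})

    cache = SubproblemCache(toy_solver)
    cache.lookup("P1", {"T1": 0})
    cache.lookup("P1", {"T1": 0})   # answered from the table
    print(cache.solver_calls)
```

Once the table holds an entry for every combination of communication values, later diagnosis calls never reach the solver and reduce to searching the join space.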

The dynamic programming idea of HCBFS can further be
used to pre-process or compile the system description to facilitate
diagnosis. Consider a partition of the graph representation
of a physical setting. The idea is to solve the diagnosis
problem for this partition for all values of the surrounding
intermediate variables (t-nodes) and store the results. We can
then treat this partition as a single physical component that
can take any value (mode) corresponding to a combination
of the values for each of its surrounding t-nodes. The associated
probabilities would be derived from the results for
the corresponding diagnosis sub-problems. This kind of
pre-compilation of the system to treat partitions as components
provides computational benefits only if their t-size is smaller
than their c-size (which is often the case).

The  space  complexity  associated  with  HCBFS  has  two 
components.  One  is  the  size  of  the  tables  associated  with 
the  sub-problems.  This  is  referred  to  as  the  table-space  com- 
plexity. It  is  easy  to  observe  that  the  table  space  complexity 
is  equal  to  the  sum  of  the  t-sizes  over  all  partitions.  Another 
component  of  the  space  requirement  is  the  actual  space  re- 
quired for  the  diagnosis  algorithms  to  build  the  tables  and 
compose  them  to  answer  a diagnosis  call.  This  space  require- 
ment is  identical  to  the  running  time  complexities  associated 
with  solving  and  composing  sub-problems.  It  is  worth  not- 
ing that  the  cost  of  implementing  dynamic  programming  in 
HCBFS  is  reflected  only  in  its  table-space  complexity. 

HCBFS  also  leads  to  what  are  called  hybrid  approaches. 
These  are  techniques  that  combine  conflict-based  and 
coverage-based  approaches  [Kumar,  to  appear]  to  solve  sub- 
problems and  combine  their  solutions.  Coverage-based  algo- 
rithms are  those  that  record  conflicts  and  cast  the  diagnosis 
problem  as  a minimum  weight  hitting  set  problem  [Kurien 
and  Nayak,  2000].  Conflict-based  approaches  refer  to  the 
standard  TMS-based  algorithms  like  CBFS  and  QCBFS.  In 
general,  hybrid  approaches  do  the  following:  (1)  Employ 
the  hierarchical  partitioning  algorithm  to  reduce  the  effective 
search  space.  (2)  Employ  one  of  coverage-based  or  conflict- 
based  approaches  for  the  sub-problems  and  the  join  space. 
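For reference, the coverage-based view can be made concrete with a brute-force minimum weight hitting set. The sketch assumes the simplest setting, which is not spelled out in this paper: every component has only the two modes normal and abnormal, each recorded conflict is a set of components that cannot all be normal, and `weight` gives the (hypothetical) cost of assuming a component abnormal.

```python
from itertools import combinations


def min_weight_hitting_set(conflicts, weight):
    """Return the cheapest set of components that intersects every conflict,
    found by exhaustive enumeration (exponential, for illustration only)."""
    universe = sorted(set().union(*conflicts)) if conflicts else []
    best, best_w = None, float("inf")
    for r in range(len(universe) + 1):
        for subset in combinations(universe, r):
            chosen = set(subset)
            if all(chosen & c for c in conflicts):
                w = sum(weight[x] for x in chosen)
                if w < best_w:
                    best, best_w = chosen, w
    return best, best_w


if __name__ == "__main__":
    conflicts = [{"A", "B"}, {"B", "C"}]
    weight = {"A": 1, "B": 3, "C": 1}
    print(min_weight_hitting_set(conflicts, weight))
```

In a hybrid approach such a routine would only ever be applied within a partition or over the join space, where the number of components involved is small.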

5 Comparison  with  Related  Work 

Related  work  on  trying  to  leverage  structure  into  the  task  of 
diagnosis  can  be  found  in  [Darwiche,  1998],  [Autio  and  Re- 
iter, 1998],  [Provan,  2001]  etc.  In  [Darwiche,  1998],  negation 
normal  forms  (NNF)  are  used  to  represent  the  consequence 
of  SD  U OBS.  Subsequently,  minimal  cardinality  diagnoses 
are  extracted  from  them  using  a simple  cost  propagation  and 
pruning  algorithm.  For  such  a procedure  to  be  effective,  it  is 
important  to  ensure  the  decomposability  of  the  NNF.  Decom- 
posability  is  achieved  by  partitioning  SD  to  perform  a case 
analysis  on  the  shared  atoms  that  do  not  appear  among  the 
observations.  The  partitioning  choices  are  inspired  by  trying 
to produce a join-tree of the topological structure of the system,
much like the m-ordering heuristics. The complexity of
the algorithm is exponential in the size of the hyper-nodes of
the join tree and linear in the number of such hyper-nodes.

There are at least three important ways in which this approach
differs from ours. Firstly, this approach does not reason about
probabilities but rather looks for minimal diagnoses (it
minimizes the number of faulty components). Secondly (and more
importantly), it produces minimal diagnoses by maintaining, at
each stage, a representation of all the consistent candidates.
The optimization step (producing minimal candidates) occurs as
a separate phase. Usually, we are not interested in all
consistent diagnoses, and representing them at every stage,
when there could be exponentially many of them, can be a
bottleneck. In our approach, the optimization and satisfaction
phases are interleaved. This allows us to produce candidates as
and when we want them, in decreasing order of their
optimization values, and to prune the search space using both
optimality and satisfiability reasoning. Thirdly, if the number
of intermediate variables is large, achieving decomposability
in the NNF is exponential in the induced width of the moralized
T-Graph; but since we are interested only in the behavior modes
of components and not in the values of intermediate variables,
the search space may be significantly reduced using our
approach when the components are “tightly” coupled.
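
To make the idea of interleaving optimization and satisfaction
concrete, the following is a simplified sketch. It is not the CBFS
or HCBFS algorithm of this paper. It enumerates mode assignments
lazily in non-increasing prior probability and uses recorded
conflicts to skip candidates without re-checking them. The function
best_first_diagnoses, the find_conflict oracle, and the two-component
example are all hypothetical, and component priors are assumed to be
independent.

```python
import heapq
from math import prod


def best_first_diagnoses(modes, find_conflict):
    """Yield (prior, assignment) pairs in non-increasing prior order.

    modes: dict component -> list of (mode, prior), sorted by
        decreasing prior.
    find_conflict: assumed consistency oracle. Returns None if the
        assignment is consistent with SD and OBS, otherwise a dict
        component -> mode whose entries cannot all hold together.
    """
    comps = list(modes)

    def prior(idx):
        return prod(modes[c][i][1] for c, i in zip(comps, idx))

    start = (0,) * len(comps)
    heap = [(-prior(start), start)]  # max-heap via negated priors
    seen = {start}
    conflicts = []
    while heap:
        neg_p, idx = heapq.heappop(heap)
        # Successors: move one component to its next most likely mode.
        for k, c in enumerate(comps):
            if idx[k] + 1 < len(modes[c]):
                nxt = idx[:k] + (idx[k] + 1,) + idx[k + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-prior(nxt), nxt))
        assignment = {c: modes[c][i][0] for c, i in zip(comps, idx)}
        # Satisfiability pruning: skip anything containing a known conflict.
        if any(all(assignment[c] == m for c, m in conf.items())
               for conf in conflicts):
            continue
        conflict = find_conflict(assignment)
        if conflict is None:
            yield -neg_p, assignment
        else:
            conflicts.append(conflict)


# Toy system: the observations rule out both components being ok.
modes = {
    "A": [("ok", 0.9), ("faulty", 0.1)],
    "B": [("ok", 0.8), ("faulty", 0.2)],
}


def oracle(assignment):
    if assignment["A"] == "ok" and assignment["B"] == "ok":
        return {"A": "ok", "B": "ok"}
    return None


results = list(best_first_diagnoses(modes, oracle))
for p, diagnosis in results:
    print(round(p, 3), diagnosis)
```

Because the function is a generator, a caller can stop after the
first diagnosis and never pay for representing the remaining
consistent candidates.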

In [Provan, 2001], the idea of hierarchical diagnosis has a
different meaning. It is based on the use of abstraction
operators to define an abstraction hierarchy of the model (a
lattice induced by a set of partitions of the system
variables). A group of components and intermediate variables at
a particular abstraction level is “merged” to form “abstract”
components at a higher level, with appropriately defined
inputs, outputs and constraints relating them. A structural
abstraction $sc$ of subcomponents $c_1, \ldots, c_k$ defines
two modes of behavior for $sc$, namely $AB(sc)$ and
$\neg AB(sc)$, with the constraint that
$\neg AB(c_1) \wedge \cdots \wedge \neg AB(c_k) \rightarrow \neg AB(sc)$.
Such an abstraction mechanism is useful only for isolating a
group of components that cannot all be behaving in their
nominal modes (abstract models isolate diagnoses only at the
abstract level, but more efficiently). At each level of
abstraction we only define the nominal mode of behavior for the
abstract component. The only other implicit mode is the faulty
mode. This limits the scope of diagnosis even at the abstract
levels. Under a combinatorial optimization formulation of the
diagnosis problem, abstraction of $c_1, \ldots, c_k$ to $sc$
only defines what happens when all components
$c_1, \ldots, c_k$ are behaving in their most probable modes
(nominal mode for $sc$). It does not say anything about what
probabilities are associated with, or what happens in, any of
the remaining exponentially many non-nominal modes. This makes
diagnosis not only infeasible at more detailed levels, but also
information-lossy at abstract levels.
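
The information loss can be quantified with a small sketch. It
assumes independent component priors that each sum to 1. The
function abstraction_coverage and the three-component example with
its numbers are hypothetical and chosen only for illustration.

```python
from math import prod


def abstraction_coverage(mode_priors):
    """Summarize what a nominal-only abstraction leaves undescribed.

    mode_priors: one dict per subcomponent, mapping mode -> prior.
    Returns (prior of the joint nominal assignment,
             number of joint assignments the abstraction says nothing about,
             total prior mass of those assignments).
    """
    # Nominal mode of sc: every subcomponent in its most probable mode.
    nominal = prod(max(p.values()) for p in mode_priors)
    # All joint mode assignments of the subcomponents.
    joint = prod(len(p) for p in mode_priors)
    return nominal, joint - 1, 1.0 - nominal


subcomponents = [
    {"ok": 0.9, "stuck": 0.07, "unknown": 0.03},
    {"ok": 0.9, "stuck": 0.07, "unknown": 0.03},
    {"ok": 0.9, "stuck": 0.07, "unknown": 0.03},
]
nominal, uncovered, lost_mass = abstraction_coverage(subcomponents)
print(uncovered, round(nominal, 3), round(lost_mass, 3))
```

With k subcomponents of m modes each, the abstraction describes one
joint assignment and leaves m^k - 1 of them without behavior or
probability.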

6 Conclusions 

In this paper, we employed the combinatorial optimization
characterization of the diagnosis problem. We compared two
different approaches that exploit different features of the
problem: (1) naive m-ordering exploits the structure of the
system by leveraging the causal dependencies among the
variables (the T-Graph), and (2) CBFS exploits the fact that
the output is uniquely determined for given inputs to a
component behaving in a known mode, and that we are interested
only in an assignment to the component modes of the system. We
observed that naive m-ordering performs poorly when there is
high interconnectedness among components and that CBFS performs
poorly when there is low coupling. We proposed a
computationally feasible algorithm called HCBFS (extending
CBFS) to achieve the best of both worlds. HCBFS uses CBFS in
tightly coupled parts of the system and m-ordering to identify
them. We showed that HCBFS has several important implications
for the complexity of diagnosis: it reduces the un-amortized
complexity of a diagnosis call, reduces the amortized
complexity of a diagnosis call by reusing computation done for
sub-problems arising in past diagnosis calls, allows
pre-compilation of the system description to facilitate
diagnosis, and enhances hybrid algorithms. Finally, we compared
and contrasted our work with related approaches.